I.   INTRODUCTION
A.   Background 

Factor  analysis  is  a  frequently  used  model  building  technique, 
especially  in  sciences  where  a  large  number  of  variables  need  to  be 
studied.   Unfortunately,  little  work  has  been  done  on  ways  of  testing  the 
goodness  of  fit  of  the  model  to  the  data.   Several  techniques  for  testing 
this  goodness  of  fit  have  been  evaluated  in  this  investigation.   In 
addition,  these  techniques  were  used  to  evaluate  the  maximum  likelihood 
estimation  procedure  as  a  factor  analytic  method. 

For  many  years,  factor  analysis  has  been  used  as  a  research  tool  for 
finding  the  major  or  meaningful  influences  on  a  set  of  variables.   Many 
different  methods  for  finding  these  influences  (or  factors)  have  been 
proposed  and  used  (Harman,  1967),  but  most  are  strictly  non-inferential 
in  nature.   That  is,  they  treat  the  observed  correlations  as  though  they 
are  population  values,  and  the  resulting  factor  loadings  are  calculated 
directly  from  the  sample  correlations.   Thus,  the  statistical  problems 
of  the  sampling  of  individuals  are  ignored,  and  results  are  usually 
considered  as  though  they  are  population  values.   On  the  other  hand,  the 
method  of  maximum-likelihood  estimation  provides  a  statistical  procedure 
for  the  estimation  of  the  parameters  of  the  factor  analytic  model,  based 
on  the  assumption  that  the  original  variables  follow  a  multivariate 
normal  distribution. 

The  basic  assumption  of  factor  analysis  is  that 

$$x = Af + \mu + e$$

where x is a random vector of p variables, f is a random vector of m < p
common factor scores, e is a random vector of p uniquenesses, A is a p x m
matrix of factor loadings, and $\mu$ is a p vector of means.  It is further
assumed that $E(f) = 0$, $E(e) = 0$, $E(ff') = I$, $E(ee') = D^2$, and
$E(fe') = 0$.  This model is usually written

$$P = AA' + D^2$$

where  P  is  the  correlation  matrix.   Here,  x  is  assumed  to  be  multivariate 
normal and the rows of A are proportional to the rows of A as Lawley (1940)
has  shown  that  the  results  are  independent  of  the  scale  of  measurement. 




B.   Algebraic Methods of Factor Analysis

The principal axis method of factor analysis, developed by Hotelling
(1933), is today probably the most commonly used factor analytic technique.
Its basic objective is to determine factors which account for the maximum
amount of variance of the observed variables.  The first factor is that
linear combination of the original variables which accounts for maximum
variance.  Each subsequent factor accounts for a maximum amount of the
remaining variance, while remaining uncorrelated with all previous factors.
Thus, the factors are derived by an algebraic rule from the sample
correlations, and can be strongly influenced by sampling variability.
The factors are determined by the matrix equation $A = V \Lambda^{1/2}$,
where V contains the normalized eigenvectors of the sample correlation
matrix as columns, and $\Lambda$ is a diagonal matrix containing the
eigenvalues, i.e., $R = V \Lambda V'$.
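
The principal axis computation can be illustrated with a minimal NumPy sketch. It is not taken from the original study: the function name `principal_axis_loadings` and the small 3 x 3 correlation matrix are assumptions made for illustration, and the eigendecomposition is applied to R with unities in the diagonal.

```python
import numpy as np

def principal_axis_loadings(R, m):
    """Return the p x m loading matrix A = V Lambda^(1/2) for the m largest eigenvalues of R."""
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]       # indices of the m largest eigenvalues
    lam = eigvals[order]
    V = eigvecs[:, order]                       # normalized eigenvectors as columns
    return V * np.sqrt(lam)                     # scale each column by sqrt(eigenvalue)

# Hypothetical sample correlation matrix, for illustration only.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

A_full = principal_axis_loadings(R, 3)   # all p factors reproduce R exactly
A_one = principal_axis_loadings(R, 1)    # first factor: maximum variance accounted for
```

With all p factors retained, `A_full @ A_full.T` reproduces R; keeping fewer columns gives the truncated solution discussed in the text.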

Minres (minimum residual) factor analysis (Harman and Jones, 1966)
is another algebraic approach to the problem of obtaining factors.  Its
aim is the best possible reproduction of the observed correlations, where
best is defined in terms of a least squares fit.  Thus the factor matrix A
is determined such that the sum of squares of the off diagonal elements
of R - AA' is minimized.


C.   Maximum  Likelihood  Factor  Analysis 

In  contrast  to  the  algebraic  techniques  mentioned  previously,  the 
method  of  maximum-likelihood  estimation  requires  the  writing  down  of 
the  density  function  of  the  observations,  given  the  population  parameters. 
Then  the  sample  values  are  considered  as  fixed,  and  the  parameters  are 
considered  as  the  variables.   The  resulting  function,  called  the  likelihood 
function,  is  then  maximized  with  respect  to  the  parameters.   The  values 
of  the  parameters  which  maximize  the  likelihood  function  are  termed  the 
maximum-likelihood estimates of the population parameters (Lehmann, 1959).

The  possibility  of  the  use  of  the  maximum-likelihood  method  for  the 
estimation  of  factor  loadings  has  existed  for  at  least  thirty  years 
(Lawley, 1940).  The method has always required an iterative procedure
with  a  large  number  of  calculations  to  be  performed  on  each  iteration. 
Thus  it  seemed  natural  that  the  development  of  computers  would  encourage 
the  use  of  the  method,  but  Lawley  and  Maxwell  (1963)  reported  that  in  some 
cases  convergence  of  the  likelihood  function  to  its  maximum  was  a  very 
slow  process  or  might  not  even  be  attained,  unless  good  initial  estimates 
for  the  factor  loadings  were  used. 

Joreskog (1967a, 1967b) has developed a new computational method which
has  the  advantage  that  the  iterative  procedure  always  converges.   In  other 
papers,  Joreskog  (1969,  1970)  has  extended  the  maximum-likelihood  estimation 
method  to  cover  a  wide  variety  of  models,  including  factor  analytic  ones. 
In  conjunction  with  these  efforts,  Joreskog,  Gruvaeus,  and  van  Thillo  (1970) 
have  developed  a  general  computer  program  to  calculate  maximum-likelihood 
estimates.   The  maximum-likelihood  estimation  procedure  also  provides  a 
likelihood-ratio  test  of  the  number  of  factors,  and  this  test  has  been  made 
available  in  their  general  computer  program. 


In applying the method of maximum likelihood to the general factor
analytic model ($P = AA' + D^2$), one writes the log of the likelihood
function, omitting a function solely of the observations (Lawley and
Maxwell, 1963)

$$\log L = -\frac{n}{2} \left[ \log |P| + \mathrm{tr}(RP^{-1}) \right]$$

where R is a sample correlation matrix based on a sample of size n + 1.
This  expression  is  then  maximized  with  respect  to  the  elements  of  the 
matrices  A  and  D,  to  obtain  a  maximum-likelihood  solution. 
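
As a hedged illustration of the quantity being maximized, the sketch below evaluates the log likelihood for given loadings and uniquenesses, reading the multiplier in front of the bracket as n/2. The function name `log_likelihood` and the one-factor example values are assumptions for illustration; this is an evaluation of the function only, not an estimation procedure such as Joreskog's.

```python
import numpy as np

def log_likelihood(R, A, d2, n):
    """log L = -(n/2) [log|P| + tr(R P^-1)], with P = A A' + diag(d2)."""
    P = A @ A.T + np.diag(d2)
    sign, logdet = np.linalg.slogdet(P)          # stable log-determinant
    trace_term = np.trace(np.linalg.solve(P, R)) # tr(P^-1 R) = tr(R P^-1)
    return -0.5 * n * (logdet + trace_term)

# Hypothetical one-factor population with unit variances.
A = np.array([[0.8], [0.7], [0.6]])
d2 = 1.0 - np.sum(A ** 2, axis=1)
P = A @ A.T + np.diag(d2)
n = 99                                           # sample size n + 1 = 100

at_truth = log_likelihood(P, A, d2, n)           # R set equal to P for the illustration

# A deliberately wrong parameter set gives a smaller value of log L.
A_bad = 0.5 * A
d2_bad = 1.0 - np.sum(A_bad ** 2, axis=1)
off_truth = log_likelihood(P, A_bad, d2_bad, n)
```

When R equals the P implied by the parameters, the trace term reduces to p, which is the largest value log L can take for that R.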

Since  the  method  of  maximum-likelihood  estimation  is  an  established 
statistical  procedure,  it  is  desirable  to  see  how  well  it  does  in  practice. 
Browne  (1968)  has  already  shown  that  maximum-likelihood  estimates  are 
preferable  to  many  other  types  when  dealing  with  sample  correlation 
matrices  drawn  from  populations  which  exactly  satisfy  the  factor  analytic 
model,  but  it  remains  to  be  seen  how  well  it  will  work  with  data  more 
like  real  data. 

D.   The  Tucker,  Koopman,  and  Linn  Study 

Tucker, Koopman, and Linn (1969) generated 54 population correlation
matrices  in  order  to  study  factor  analytic  methods.   One  of  their  simulated 
correlation  matrices  was  defined  by 

$$R = B_1 P_1 B_1 + B_2 P_2 B_2 + B_3 P_3 B_3$$

where $B_1$, $B_2$, and $B_3$ were diagonal matrices with real positive
diagonal elements $b_{1j}$, $b_{2j}$, and $b_{3j}$ (j = 1, 2, ..., p, the
number of variables), respectively.  Since $b_{1j}^2$ was the proportion of
variance of variable j due to the major factors, $b_{2j}^2$ the proportion
due to the minor factors, and $b_{3j}^2$ the proportion due to the unique
factors, they had:

$$b_{1j}^2 + b_{2j}^2 + b_{3j}^2 = 1.$$

Thus, $b_{1j}^2 + b_{2j}^2$ equaled the communality (common variance) of
variable j.  Three different relationships between these coefficients
defined the three types of population matrices used by Tucker et al.
Correlation matrices with $b_{2j}^2 = 0$ and $b_{3j}^2 = (1 - b_{1j}^2)$
exactly fitted the mathematical factor analytic model (common factors +
uniquenesses).  With $b_{3j}^2 = 0$ and $b_{2j}^2 = (1 - b_{1j}^2)$, the
correlation matrices constituted the simulation model, as $P_2$ contained
the accumulated effect of 180 minor factors.  It was hoped that these
simulation matrices would approximate real data population-correlation
matrices which could be thought of as arising from a few major and many
minor influences.  They also employed a third model which contained
influences of both minor factors and uniquenesses.  For this middle model,
they had $b_{2j}^2 = b_{3j}^2 = (1 - b_{1j}^2)/2$.

All correlation matrices contained 20 variables.  Each $P_s$ matrix
(s = 1, 2, 3) was constructed from the relationship $P_s = A^*_s A^{*\prime}_s$,
where $A^*_s$ was obtained by adjusting the rows of $A_s$ to be of unit
length.

The $A_1$ matrices were generated by random processes and contained
either three or seven columns, representing the number of major factors
in each correlation matrix.  The $A_2$ matrices were generated by another
random process so that the effect of $P_2$ was as though there were 180
minor factors in it.  $P_3$ was an identity matrix, as the factor analytic
model assumes a unique factor for each variable, and that these unique
factors are uncorrelated.  Tucker et al. also used three levels of entries
in the $B_1$ matrices: high (.6, .7, .8), wide (.2, .3, .4, .5, .6, .7, .8),
and low (.2, .3, .4).  Thus their design was three (models) x two (number of
major factors) x three (levels of $B_1$ coefficients), and they generated
three correlation matrices for each of the eighteen cells.  Tucker, Koopman,
and Linn were interested in comparing several factor analytic techniques,
but the data they generated are useful for studying any procedures
related to factor analysis.
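
The construction of a population correlation matrix from its three parts can be sketched as follows. This is only an illustration under stated assumptions: the loading matrices are plain standard-normal stand-ins (not the random processes of Tucker, Koopman, and Linn), the high-level values are treated as entries of the major-domain B matrix, and the names `unit_rows` and `population_matrix` are hypothetical.

```python
import numpy as np

def unit_rows(A):
    """Rescale each row of A to unit length."""
    return A / np.linalg.norm(A, axis=1, keepdims=True)

def population_matrix(A1, A2, b1, model):
    """R = B1 P1 B1 + B2 P2 B2 + B3 P3 B3 for the formal, simulation, or middle model."""
    p = A1.shape[0]
    rest = 1.0 - b1 ** 2                       # variance not due to the major factors
    if model == "formal":                      # major factors + uniquenesses
        b2_sq, b3_sq = np.zeros(p), rest
    elif model == "simulation":                # major + minor factors
        b2_sq, b3_sq = rest, np.zeros(p)
    elif model == "middle":                    # remainder split equally
        b2_sq, b3_sq = rest / 2.0, rest / 2.0
    else:
        raise ValueError("unknown model: " + model)
    P1 = unit_rows(A1) @ unit_rows(A1).T
    P2 = unit_rows(A2) @ unit_rows(A2).T
    P3 = np.eye(p)                             # uncorrelated unique factors
    B1, B2, B3 = np.diag(b1), np.diag(np.sqrt(b2_sq)), np.diag(np.sqrt(b3_sq))
    return B1 @ P1 @ B1 + B2 @ P2 @ B2 + B3 @ P3 @ B3

rng = np.random.default_rng(0)
p = 20
A1 = rng.standard_normal((p, 3))               # stand-in for 3 major factors
A2 = rng.standard_normal((p, 180))             # stand-in for 180 minor factors
b1 = rng.choice([0.6, 0.7, 0.8], size=p)       # "high" level, assumed to be entries of B1

R_formal = population_matrix(A1, A2, b1, "formal")
R_sim = population_matrix(A1, A2, b1, "simulation")
```

Because the squared coefficients sum to one for every variable, each matrix has a unit diagonal; in the formal case, removing the uniquenesses leaves a matrix of rank three.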

In the Tucker, Koopman, and Linn (1969) study, the authors used
a random process to generate conceptual input factor loadings K for the
major factor domain.  They combined these with random normal deviates,
applied a skewing function, and multiplied by the matrices $B_1$, in order
to get to actual input factor loadings $A_1$ ($A_1 = B_1 A^*_1$, where
$A^*_1 A^{*\prime}_1 = P_1$).  The authors used joint rotations of actual
input factors with output factors, and also rotations of output factors
only, to assess the degree to which actual input factors were found on
output.  Thus there were two methods of comparison used, and each resulted
in a separate index (coefficient of congruence) for each actual input factor.

Although  the  raw  data  of  the  Tucker,  Koopman,  and  Linn  study  consisted 
of  population-correlation  matrices  and  not  samples,  some  of  their  results 
can  serve  as  standards  for  some  of  the  results  of  the  current  study.   In 
general, the reproduction of the actual input factors in the output factors
was very good for the formal model, and poorer for the simulation model.
The reproduction was good with a high level of $b_{1j}^2$ and poorer for a
low level.  Thirdly, results were better for three factors than for seven.
Finally, the combination of simulation model, low $b_{1j}^2$, and seven
factors produced extremely poor results.  These results led Tucker et al.
to conclude that the quality of factor analytic results depended heavily on
the design and conduct of the study.

E.   Goodness  of  Fit 


The set of factor analytic methods can be divided into two parts:
exploratory methods which are used in early investigations in an area,
with the purpose of reducing a large number of variables to a smaller
number of factors when the investigator has no a priori hypotheses as to
the composition of the factors; and confirmatory methods which are used
by investigators with specifiable hypotheses about the factors.  The
present study considers confirmatory factor analysis only, and a major
interest is in the discovery or development of a measure which would
reflect the degree of fit of the final solution to the specified
hypothesis.  It is possible to test this hypothesis via the likelihood-ratio
technique, and although the distribution of the likelihood-ratio
statistic has not been tabled, it is distributed approximately as a
$\chi^2$ in large samples (Lawley and Maxwell, 1963).  Unfortunately, this
test sets up the hypothesis as a null hypothesis, and as the sample size
increases, it is more likely to be rejected, as no hypothesis is exactly
true.  Thus, this test is of little use to many researchers who are
interested in how well their data agree with their model, fully realizing
that their model cannot be exactly true in the population.  Therefore what
is needed is a measure to assess the goodness of fit of the model to the
data.  Thus, the problem in this study is different from the one considered
by Tucker, Koopman, and Linn, as they were interested in factor matching,
while here, the aim is to have one index to measure the total goodness of
fit.

Tucker (personal communication) has suggested a measure

$$\rho_{1m} = 1 - \frac{\chi^2_m / df_m - 1}{\chi^2_0 / df_0 - 1}$$

where $\chi^2 / df$ is the chi-square approximate test criterion for the
likelihood-ratio test statistic divided by its degrees of freedom, taken
after zero factors and after m factors have been extracted.  This measure
is analogous to a percent of variance accounted for by the model, as the
expected value of a $\chi^2$ random variable divided by its degrees of
freedom is one.  More recently, Tucker and Lewis (1970) have developed a
second reliability coefficient,


$$\rho_{2m} = \frac{M_0 - M_m}{M_0 - 1/n'_m}$$

where $n'_m = N - 1 - \frac{1}{6}(2p + 5) - \frac{2}{3}m$, p = number of
variables, $M_m = F_m / df_m$, $F_m$ = minimum value of
$F_m(A, D) = \log_e |P| + \mathrm{tr}(RP^{-1}) - \log_e |R| - p$
(Joreskog, 1967b) for m factors, and $df_m$ = degrees of freedom for m
factors.

It was hoped that this coefficient would be independent of the sample size
and would provide an estimate of the goodness of fit of the factor
analytic model in the population.  Tucker and Lewis calculated $\rho_2$ for
the number of major factors for some of the population-correlation matrices
of Tucker, Koopman, and Linn.  These values (Table 1) can serve as targets
for the current study.  These two measures ($\rho_{1m}$ and $\rho_{2m}$) are
similar (as can be seen by substituting $\chi^2_m = n'_m F_m$ in
$\rho_{1m}$), but not identical.  It is hoped that one or both of them are
good indicators of goodness of fit for maximum-likelihood factor analysis.
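
The relation between the two reliability coefficients can be made concrete with a short sketch. Two things here are assumptions rather than statements from the text: the degrees of freedom are taken from the standard expression for the maximum-likelihood factor model, [(p - m)^2 - (p + m)]/2, and the values of F_0 and F_m are hypothetical numbers chosen only to exercise the formulas. The chi-square criterion is computed as n'_m F_m, following the substitution mentioned above.

```python
def degrees_of_freedom(p, m):
    """Assumed standard df for an m-factor model on p variables."""
    return ((p - m) ** 2 - (p + m)) / 2.0

def n_prime(N, p, m):
    """Multiplier n'_m = N - 1 - (2p + 5)/6 - 2m/3."""
    return N - 1 - (2 * p + 5) / 6.0 - 2.0 * m / 3.0

def rho_1(F0, Fm, N, p, m):
    """Tucker's coefficient, with chi-square taken as n' * F."""
    chi0 = n_prime(N, p, 0) * F0
    chim = n_prime(N, p, m) * Fm
    ratio_0 = chi0 / degrees_of_freedom(p, 0)
    ratio_m = chim / degrees_of_freedom(p, m)
    return 1.0 - (ratio_m - 1.0) / (ratio_0 - 1.0)

def rho_2(F0, Fm, N, p, m):
    """Tucker-Lewis coefficient (M_0 - M_m) / (M_0 - 1/n'_m)."""
    M0 = F0 / degrees_of_freedom(p, 0)
    Mm = Fm / degrees_of_freedom(p, m)
    return (M0 - Mm) / (M0 - 1.0 / n_prime(N, p, m))

# Hypothetical minimum function values for zero and three factors.
N, p, m = 400, 20, 3
F0, Fm = 5.0, 0.45
r1 = rho_1(F0, Fm, N, p, m)
r2 = rho_2(F0, Fm, N, p, m)
```

The two functions differ only in which multiplier is applied to M_0 (n'_0 in the first, n'_m in the second), which is why the coefficients are close but not identical.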


Table 1

Values of $\rho_2$ Obtained by Tucker and Lewis from Eight of the Tucker,
Koopman, and Linn Population Correlation Matrices (N = $\infty$, p = 20).

3 Factors in Major Domain, Reliabilities for 3 Common Factor Models

                     Formal Model       Simulation Model
high $b_{1j}^2$          1.00                 .83
low $b_{1j}^2$           1.00                 .55

7 Factors in Major Domain, Reliabilities for 7 Common Factor Models

                     Formal Model       Simulation Model
high $b_{1j}^2$          1.00                 .71
low $b_{1j}^2$           1.00                 .48


Another  possible  measure  of  goodness  of  fit  is  the  sum  of  squares  of 
differences  between  the  correlations  implied  by  the  model  and  those 
reproduced  by  the  actual  output  factors.   Browne  (1968)  suggested  this 
measure  of  goodness  of  fit: 

$$c_1 = \sum_{i=1}^{p} \sum_{j=1}^{i} [\Phi\Phi' - AA']_{ij}^2$$

where A is the sample factor matrix, $\Phi$ is the population factor matrix,
and p is the number of variables.  Of course, another possibility is to
exclude the diagonal elements.  This would emphasize reproduction of the
correlations, while ignoring the communalities:

$$c_2 = \sum_{i=2}^{p} \sum_{j=1}^{i-1} [\Phi\Phi' - AA']_{ij}^2$$


Both measures were scaled by the total sum of squares in order to produce
coefficients, $r_1$ and $r_2$, with upper limits of 1.00.  In most cases
they should vary between zero and one.

$$r_1 = 1 - \frac{c_1}{\sum_{i=1}^{p} \sum_{j=1}^{i} [\Phi\Phi']_{ij}^2}$$

$$r_2 = 1 - \frac{c_2}{\sum_{i=2}^{p} \sum_{j=1}^{i-1} [\Phi\Phi']_{ij}^2}$$

These measures ($c_1$, $c_2$, $r_1$, $r_2$) are all invariant under
orthogonal rotation of the sample factor matrix A, and of the hypothesis
factor matrix $\Phi$.  All six measures (including $\rho_1$ and $\rho_2$)
were obtained for all 96 sample correlation matrices.
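
A short sketch of these four measures follows, reading the double sums as running over the lower triangle of the matrix, with the diagonal for the first pair of measures and without it for the second. That reading, the function name `fit_measures`, and the randomly generated matrices are assumptions for illustration only. The example also checks the invariance under orthogonal rotation numerically.

```python
import numpy as np

def fit_measures(Phi, A):
    """Return (c1, c2, r1, r2) comparing population loadings Phi with sample loadings A."""
    target = Phi @ Phi.T                       # correlations implied by the model
    diff = target - A @ A.T                    # minus those reproduced by the sample factors
    p = Phi.shape[0]
    lower = np.tril_indices(p)                 # i >= j, diagonal included
    strict = np.tril_indices(p, k=-1)          # i > j, diagonal excluded
    c1 = np.sum(diff[lower] ** 2)
    c2 = np.sum(diff[strict] ** 2)
    r1 = 1.0 - c1 / np.sum(target[lower] ** 2)
    r2 = 1.0 - c2 / np.sum(target[strict] ** 2)
    return c1, c2, r1, r2

rng = np.random.default_rng(1)
Phi = rng.uniform(0.2, 0.6, size=(6, 2))              # hypothetical population loadings
A = Phi + 0.05 * rng.standard_normal(Phi.shape)       # hypothetical sample loadings

# A random orthogonal rotation of the sample factors leaves A A' unchanged.
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))
unrotated = fit_measures(Phi, A)
rotated = fit_measures(Phi, A @ Q)
perfect = fit_measures(Phi, Phi)
```

The rotated and unrotated results agree because only the products of each factor matrix with its own transpose enter the formulas.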



II.   METHOD 


A.   Data  Used 


Due  to  limitations  on  computer  time,  it  was  necessary  to  use  only 
some  of  the  population  correlation  matrices  from  the  Tucker,  Koopman,  and 
Linn study.  In order to preserve the effects due to the independent
variables used in generating those matrices, it was decided to randomly
select one matrix from each of eight cells in their design.  The eight cells
were created by using two levels of each of the three independent variables
used by Tucker et al., i.e., model (formal vs simulation), level of
$b_{1j}$ (high vs low), and number of factors in the major domain (3 vs 7).
The eight matrices used are identified in Table 2.  The level of battery
(1, 2, or 3) was used by Tucker, Koopman, and Linn to designate a particular
correlation matrix, as they had three such matrices in each cell in their
design.  In the current study, one battery was randomly selected from each
of the eight cells of interest.  In order to include the parameter of sample
size, it was decided to draw samples of size 100, 400, and 1600 from each
population-correlation
matrix.   To  achieve  some  stability  of  results,  four  sample  correlation 
matrices  were  drawn  from  each  population-correlation  matrix,  at  each  level 
of  sample  size,  yielding  96  sample  correlation  matrices. 

Table 2

            Level                         Number of
Matrix      of $b_{1j}$    Model          Factors       Battery

  1         high           formal            3             2
  2         high           formal            7             2
  3         high           simulation        3             3
  4         high           simulation        7             3
  5         low            formal            3             3
  6         low            formal            7             1
  7         low            simulation        3             1
  8         low            simulation        7             1

B.   Generation of Sample Correlation Matrices

The  intuitive  way  to  generate  sample  correlation  matrices  is  to 
generate samples of random variables from a multivariate normal
distribution with a specified correlation matrix (Kaiser and Dickman, 1962)
and to calculate the sample correlation matrices directly from this raw
data.  However, this method requires a large quantity of random numbers and
a large amount of computer time, especially when large sample sizes are
required.  To avoid this problem, a more economical procedure, described
by Odell and Feiveson (1966) and used by Browne (1968), was used in this
study.

In  order  to  compute  a  sample  correlation  matrix  R  when  given  the 
population  correlation  matrix  P,  one  uses 

$$R = (\mathrm{Diag}[A])^{-1/2} \, A \, (\mathrm{Diag}[A])^{-1/2}$$   and

$$A = (\Omega T)(\Omega T)'$$   where

$P = \Omega\Omega'$ and the elements of T (lower triangular) are chosen as
independently distributed variables:

    $t_{ij}$ is distributed as N(0,1)   (i > j)
    $t_{ii}$ is distributed as Chi with (N - i) degrees of freedom
    $t_{ij} = 0$   (i < j)

For convenience of calculation, $\Omega$ was chosen to be lower triangular
and was obtained by the square root method for triangular factoring (Dwyer,
1945).  Thus, this method requires only the generation of p(p - 1)/2 random
normal deviates and p (p = 20, the number of variables) random Chi variables
for each sample correlation matrix, regardless of the sample size.  Also a
large amount of computational time is saved in the calculation of the
correlation matrix.
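
The sampling scheme can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the program used in the study: the triangular factor of P is obtained with `numpy.linalg.cholesky` as a stand-in for the square root method, the normal and Chi deviates come from NumPy's own generator rather than the generators described in this chapter, and the function name and example matrix are hypothetical.

```python
import numpy as np

def sample_correlation_matrix(P, N, rng):
    """Draw one sample correlation matrix for sample size N from population matrix P."""
    p = P.shape[0]
    omega = np.linalg.cholesky(P)                       # lower triangular, P = omega omega'
    T = np.zeros((p, p))
    # p(p-1)/2 standard normal deviates below the diagonal
    T[np.tril_indices(p, k=-1)] = rng.standard_normal(p * (p - 1) // 2)
    # p Chi deviates on the diagonal, with N - i degrees of freedom (i = 1..p)
    dfs = N - np.arange(1, p + 1)
    T[np.diag_indices(p)] = np.sqrt(rng.chisquare(dfs))
    G = omega @ T
    A = G @ G.T                                         # sums of squares and products
    d = 1.0 / np.sqrt(np.diag(A))
    return A * np.outer(d, d)                           # rescale to unit diagonal

# Hypothetical population correlation matrix, for illustration only.
P = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
rng = np.random.default_rng(2)
R_small = sample_correlation_matrix(P, 100, rng)
R_large = sample_correlation_matrix(P, 1600, rng)
```

The number of random deviates drawn depends only on p, not on N, which is the economy the procedure is chosen for.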

In order to generate the random normal deviates for the T matrix, it
was first necessary to generate random integers on the computer.  These
integers were converted to real numbers uniformly distributed between
zero and one, and then these were normalized.  Unfortunately, there is no
way to pick truly random numbers on the computer, so the random integers
needed were produced by a simple arithmetic process.  These random integers
are often called pseudo random integers, because they are produced by a
deterministic process.  Richardson (1969) reviewed several methods of
generating pseudo random integers and chose the multiplicative congruential
method as the best for the IBM 360, on the basis of randomness (passing
statistical tests), length of period (number of integers generated before
the sequence repeats itself), and generation time needed.  This method is
based on the relation $X_{i+1} = aX_i \pmod{m}$, which means that $aX_i$ is
divided by m and the (i+1)st random integer $X_{i+1}$ is set equal to the
remainder.*

* The modulus m was set to $2^{24}$ in order to provide the maximum possible
period.  The constant a was chosen by Richardson from 1500 different
multipliers, as the one which produced the integers with the best statistical
properties.  Integers on the IBM 360 occupy 32 binary digits (bits), but
real numbers use only 24 bits (the remaining 8 are used for the exponent).
Thus, the pseudo random integers were converted to a uniform distribution
by merely inserting the appropriate exponent in the first eight bits, so
that the real numbers would lie between zero and one.

Muller (1959) compared several methods of generating pseudo random
normal deviates from pseudo random numbers on the interval (0, 1).  The
direct approach (Box and Muller, 1958) was picked as best because of the
resulting reliability in the tails of the distribution and the relatively
greater accuracy when compared with other methods.  The transformations are:

$$X_1 = (-2 \log_e U_1)^{1/2} \cos 2\pi U_2$$

$$X_2 = (-2 \log_e U_1)^{1/2} \sin 2\pi U_2$$

where $U_1$ and $U_2$ are pseudo random numbers from the interval (0, 1), and
$X_1$ and $X_2$ are independent variables from the normal distribution with
mean zero and unit variance [N(0, 1)].
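
The two steps, a multiplicative congruential generator followed by the direct (Box and Muller) transformation, can be sketched as follows. The multiplier and seed below are hypothetical placeholders, since the constant chosen by Richardson is not given in the text, and the modulus 2^24 is the value as read from the footnote. Dividing by the modulus stands in for the bit-level conversion used on the IBM 360.

```python
import math

MODULUS = 2 ** 24          # modulus as read from the footnote (an assumption)
MULTIPLIER = 5 ** 9        # hypothetical multiplier, NOT Richardson's constant

def congruential_integers(seed, count, a=MULTIPLIER, m=MODULUS):
    """Generate `count` pseudo random integers with X_{i+1} = a X_i (mod m)."""
    x = seed
    out = []
    for _ in range(count):
        x = (a * x) % m    # remainder after dividing a * X_i by m
        out.append(x)
    return out

def box_muller(u1, u2):
    """Transform two uniforms on (0, 1) into two independent N(0, 1) deviates."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

# An odd seed keeps every integer odd, so no uniform is exactly zero.
integers = congruential_integers(seed=12345, count=1000)
uniforms = [x / MODULUS for x in integers]

normals = []
for u1, u2 in zip(uniforms[0::2], uniforms[1::2]):
    normals.extend(box_muller(u1, u2))
```

Each pair of uniforms yields a pair of normal deviates, so 1000 integers produce 1000 deviates.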

In  order  to  determine  the  pseudo  random  Chi  variables  for  the 
diagonals  of  the  T  matrix,  the  following  approximation  was  used 
(Abramowitz  and  Stegun,  1966) 


$$\chi^2_p \approx \nu \left[ 1 - \frac{2}{9\nu} + (X_p - h_\nu) \sqrt{\frac{2}{9\nu}} \right]^3 \qquad (\nu > 30)$$

where $\nu$ = degrees of freedom and $X_p$ is a pseudo random normal
deviate.  The value for $h_\nu$ is gotten from the relation
$h_\nu = \frac{60}{\nu} h_{60}$, where $h_{60}$ is tabled against values of
$X_p$ from -3.5 to +3.5 by Abramowitz and Stegun.  A cubic equation was used
to interpolate between the tabled values of $h_{60}$:

$$\hat{h}_{60} = -.000924 X_p - .000159 X_p^2 + .000308 X_p^3 + .000189$$

The correlation between $\hat{h}_{60}$ and $h_{60}$ (for the 15 tabled
values) was 1.0000.
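
A minimal sketch of this approximation, using the cubic for h_60 quoted in the text, is given here. The function names are hypothetical, and taking the square root of the result to obtain a Chi deviate for the diagonal of T is an assumption about how the value was used.

```python
import math

def h_60(x):
    """Cubic fitted to the tabled values of h_60 (coefficients from the text)."""
    return -0.000924 * x - 0.000159 * x ** 2 + 0.000308 * x ** 3 + 0.000189

def chi_square_deviate(x, nu):
    """Approximate chi-square deviate with nu > 30 df from a normal deviate x."""
    if nu <= 30:
        raise ValueError("approximation is intended for nu > 30")
    h_nu = (60.0 / nu) * h_60(x)
    scale = math.sqrt(2.0 / (9.0 * nu))
    return nu * (1.0 - 2.0 / (9.0 * nu) + (x - h_nu) * scale) ** 3

def chi_deviate(x, nu):
    """Chi deviate taken as the square root of the chi-square deviate (assumed usage)."""
    return math.sqrt(chi_square_deviate(x, nu))

median_60 = chi_square_deviate(0.0, 60)      # a normal deviate of 0 maps to the median
upper_60 = chi_square_deviate(1.645, 60)     # roughly the upper 5 percent point
```

Because the transformation is monotone in the normal deviate over the tabled range, percentage points of the normal distribution map onto the corresponding points of the chi-square distribution.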

C.   Factor Analyses

Each correlation matrix was factored using Joreskog's (1967a)
maximum-likelihood factor analysis program.  The maximum number of iterations
was set to 100 and the probability of chance occurrence was set to 1.0 so
that all solutions were obtained.  Solutions were obtained for the number
of factors in the major domain.  Additionally, the likelihood ratio tests
of the number of common factors were obtained from zero up to the number of
factors in the major domain, so that $\rho_1$ and $\rho_2$ could be
calculated for each possible number of factors.  The coefficients $c_1$,
$c_2$, $r_1$, and $r_2$ were calculated for each factor matrix.

D.  Analysis  of  Variance 

In  order  to  determine  the  effects  of  the  four  independent  variables 
(model, level of $b_{1j}$, number of factors, and number of observations) on
the measures of goodness of fit, six separate fixed-factor analyses of
variance were performed, each being 2 x 2 x 2 x 3.

III.   RESULTS
A.   Reliability Coefficients $\rho_1$ and $\rho_2$

The means of $\rho_1$ and $\rho_2$, across the four samples of the same size
for each population-correlation matrix, are presented in Appendix A.  There
were only small differences between the two coefficients.  For very large
sample sizes, the formulas yield quite similar results, as is illustrated
by the samples of size 1600.  The correlation between $\rho_1$ and $\rho_2$
for the number of factors in the major domain, across all 96 sample
factorings, was .998.  All results are discussed mainly in terms of
$\rho_2$, as it is the later, published version.

$\rho_2$ was an excellent measure of goodness of fit for the factor matrices
obtained from samples from the population-correlation matrices (hereafter
called sample factor matrices) of the formal model.  In the three factor
matrices, with high or low $b_{1j}^2$ (Table 3), $\rho_2$ was very close to
1.00 for all sample sizes.  With seven factors and high $b_{1j}^2$ (Table 4)
the results were as good.  However, with seven factors but low $b_{1j}^2$,
$\rho_2$ went above 1.00 after four factors with only 100 observations.  The
average value of $\rho_2$ after seven factors were obtained was 1.4285, and
the individual values were 1.0956, 1.0428, 1.4710, and 2.1046.  This value,
1.4285, was much larger than the population value of 1.00 obtained by Tucker
and Lewis (1970).  While this result was probably due to the small sample
size of 100, it reflected an undesirable property for a reliability
coefficient.  However, with 400 and 1600 observations, results were much
better.  Thus, the method of maximum likelihood resulted in good solutions
as measured by $\rho_2$ when the populations exactly fitted the factor
analytic model.  The only exception was with variables of low communality,
in which case more observations were necessary to obtain a good fit.




Table 3

Means of $\rho_2$ for Matrices with 3 Factors in the Major Domain,
After 3 Factors Have Been Obtained

                      Sample
                      size       Formal Model     Simulation Model

                       100          1.0011             .8535
High $b_{1j}^2$        400          1.0018             .8253
                      1600          1.0004             .8319

                       100           .9929             .6157
Low $b_{1j}^2$         400          1.0127             .5690
                      1600          1.0023             .5623


Table 4

Means of $\rho_2$ for Matrices with 7 Factors in the Major Domain,
After 7 Factors Have Been Obtained

                      Sample
                      size       Formal Model     Simulation Model

                       100          1.0139             .7348
High $b_{1j}^2$        400          1.0014             .6829
                      1600           .9997             .7118

                       100          1.4285             .6022
Low $b_{1j}^2$         400          1.0482             .5312
                      1600           .9984             .4982


Table 5

Means of $\rho_2$ for Matrices with 3 Factors in the Major Domain,
After 4 Factors Have Been Obtained

                      Sample
                      size       Formal Model     Simulation Model

                       100          1.0116             .8778
High $b_{1j}^2$        400          1.0054             .8422
                      1600          1.0011             .8442

                       100          1.0416             .6743
Low $b_{1j}^2$         400          1.0257             .6029
                      1600          1.0050             .6108

Results from the simulation model matrices were not nearly as good.
In no case did $\rho_2$ reach the value of 1.00.  The largest values were
obtained with high $b_{1j}^2$ and three factors (Table 3), but the highest
was .8535.  There was a trend for $\rho_2$ to decrease with increased sample
size for simulation model matrices with low $b_{1j}^2$ (Tables 3 and 4).
$\rho_2$ also became smaller as the number of factors in the major domain
increased and as $b_{1j}^2$ decreased.

The calculation of $\rho_2$ was extended to four factors in the three factor
matrices, in order to see how it behaved.  It was thought that there might
be some leveling off after three factors.  This did occur for the formal
model (Table 5), after $\rho_2$ had already reached 1.000.  There was some
tendency for the values of $\rho_2$ to level off for the simulation model,
high $b_{1j}^2$, as the increase from two factors to three factors was much
greater than that from three factors to four factors (Appendix A).  In the
simulation model with low $b_{1j}^2$, there were no signs of a leveling off
of $\rho_2$ after three factors.

B.   Other  Goodness  of  Fit  Measures 


The results for $c_1$ and $c_2$ (Table 6) were very similar, as were the
results for $r_1$ and $r_2$.  To get an idea of the degree of similarity, the
coefficients were correlated across all 96 matrices.  Since $c_1$ correlated
.968 with $c_2$, and $r_1$ correlated .994 with $r_2$, results will be
discussed in terms of $c_1$ and $r_1$ only.  The coefficient $c_1$ behaved
exactly as expected.  For all eight matrices, $c_1$ got smaller as the sample
size increased.  In all cases, increasing the number of factors, while
holding model, level of $b_{1j}^2$, and sample size fixed, caused an increase
in $c_1$.  In all cases, moving from the formal model to the simulation model
while holding the other three independent variables fixed caused an increase
in $c_1$.  Finally, in all




Table 6

Means of the Coefficients c_1, c_2, r_1, and r_2, After the Number
of Factors in the Major Domain Have Been Extracted

Matrix    Sample size       c_1         c_2         r_1         r_2

  1           100           .9587       .9050       .9782       .9737
              400           .2472       .2296       .9944       .9933
             1600           .0464       .0426       .9990       .9988

  2           100          1.6355      1.4322       .9413       .9201
              400           .4625       .4336       .9834       .9758
             1600           .1035       .0943       .9963       .9947

  3           100          1.4689      1.3214       .9705       .9671
              400           .5290       .4547       .9894       .9887
             1600           .3523       .2865       .9929       .9929

  4           100          2.5336      2.1420       .8915       .8403
              400           .9252       .6490       .9604       .9516
             1600           .6677       .4593       .9714       .9657

  5           100          1.5298      1.2331       .8541       .8520
              400           .3131       .2732       .9701       .9672
             1600           .0749       .0623       .9929       .9925

  6           100          3.7076      1.8881       .1256       .2423
              400           .7221       .3854       .8297       .8453
             1600           .2576       .1078       .9392       .9567

  7           100          5.0938      4.1732       .5351       .5258
              400          2.9050      2.0161       .7349       .7709
             1600          2.3474      1.7914       .7858       .7964

  8           100          6.7834      4.1999      -.5998      -.6854
              400          5.1870      3.0895      -.2233      -.2398
             1600          4.7623      2.7143      -.1231      -.0893
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Table  7 

Standard  Deviations  of  the  Coefficients  c 

Number  of  Factors  in  the  Major  Domain  Have  Been  Extracted 


,  c  ,  r  ,  and  r  ,  After  the 


Matrix 


Sample 
size 


1 

100 
1+00 

1600 

.2668 
.0696 
.0066 

.2555 
.0632 
.0071 

.0061 
.0016 

.0001 

.0071+ 
.0018 

.0002 

2 

100 

1+00 

1600 

.1+390 

.1100 
.0230 

.5500 

.1057 
.021+0 

.0177 

.0039 

.0008 

.0307 
.0059 

.0013 

3 

100 

1+00 

1600 

.9731+ 
.01+16 
.053^ 

.9287 
.01+59 

.0U57 

.0196 
.0008 

.0011 

.0231 
.0011 
.0011 

k 

100 
1+00 

1600 

.1+661 
.1811+ 
.1701+ 

.2975 
.1938 
.1382 

.0200 

.0078 
.0073 

.0222 
.011+8 
.0103 

5 

100 
1+00 

1600 

.2350 
.1363 
.0130 

.0933 
.1325 
.0061 

.0221+ 
.0130 
.0012 

.0112 
.0159 
.0007 

6 

100 

1+00 

1600 

.21+19 
.1819 

.13^8 

.1978 
.051+6 

.0111 

.0571 
.01+29 
.0318 

.079!+ 
.0219 
.001+5 

7 

100 
1+00 

1600 

2.329U 
.1106 
.^228 

2.2251+ 
.0656 
.1850 

.2126 

.0101 

.0386 

.2529 
.0075 
.0210 

8 

100 
1+00 

1600 

.6732 

.61+1+9 

.3716 

.5566 

.6975 
.1726 

.1503 
.1521 
.0876 

.2231+ 
•  2799 
.0692 
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2 
cases,  c   increased  when  the  b   went  from  high  to  low.   (Note,  there  was 

one  reversal  of  this  last  finding  with  c  ,  which  decreased  as  the  level  of 

2 
b   went  from  high  to  low  from  matrix  2  to  matrix  6.) 
-*-  J 

Also, $r_1$ increased with increased sample size, increased values of $b^2_{1j}$, fewer factors, and from the simulation model to the formal model. For high $b^2_{1j}$, formal model, the values of $r_1$ were good for three factors, all sample sizes, while seven factors required a sample size of 400 for a satisfactory result. With the simulation model, low $b^2_{1j}$, and three factors, $r$ reached only .7964 with 1600 observations. In matrix 8 (simulation model, low $b^2_{1j}$, seven factors), the values of $r_1$ were actually negative. This was partly due to the low total sum of squares in the model correlation matrix, but this result indicated, much better than did $\rho$, the inaccuracy of these solutions.

The standard deviations of the coefficients $c_1$, $c_2$, $r_1$, and $r_2$ were calculated (Table 7). There was a tendency for the standard deviations to be smaller with better fit, but there were more reversals than with the means. Also, since $r_1$ and $r_2$ had upper limits of 1.0, their standard deviations were forced to decrease as the means increased because the upper limit was being approached.

C. Use of the Likelihood-Ratio Test

Jöreskog (1967b) used the likelihood-ratio technique to test the hypothesis that the number of factors $m$ was a given number. The exact distribution of the likelihood-ratio test statistic is not known, but for large $N$ its distribution is approximately a $\chi^2$ distribution with degrees of freedom $\tfrac{1}{2}[(p - m)^2 - (p + m)]$. If the hypothesis of $m$ factors was rejected (due to a statistically significant value of the test statistic), Jöreskog refactored the matrix for $m + 1$ factors. It was thought that by looking at the probability levels (probabilities of the chance occurrence of the observed $\chi^2$ values) of the test statistic for various numbers of factors, one might be able to determine the correct number of factors.

Table 8

Range of Probability Levels for $\chi^2$ Statistics

3 Factor Matrices

                          2 Factors          3 Factors          7 Factors
Matrix   Sample size     Min.     Max.      Min.     Max.      Min.     Max.

  1          100        .0000    .0000     .1173    .8533     .9890    .9976
             400        .0       .0        .2697    .9211     .9820   1.0000
            1600        .0       .0        .2028    .8257     .9776    .9968

  3          100        .0000    .0000     .0000    .0000     .0000    .0194
             400        .0       .0        .0       .0        .0000    .0000
            1600        .0       .0        .0       .0        .0       .0

  5          100        .0066    .6504     .2060    .8027     .9376    .9960
             400        .0000    .0000     .1201    .9712     .8897    .9982
            1600        .0       .0        .5070    .9186     .9616    .9923

  7          100        .0000    .0000     .0000    .0000     .0007    .0054
             400        .0       .0        .0       .0        .0000    .0000
            1600        .0       .0        .0       .0        .0       .0

7 Factor Matrices

                          4 Factors          6 Factors          7 Factors
Matrix   Sample size     Min.     Max.      Min.     Max.      Min.     Max.

  2          100        .0000    .0000     .0000    .0683     .5381    .8465
             400        .0       .0        .0000    .0000     .2903    .7948
            1600        .0       .0        .0       .0        .2043    .7330

  4          100        .0000    .0000     .0000    .0000     .0000    .0000
             400        .0       .0        .0       .0        .0       .0
            1600        .0       .0        .0       .0        .0       .0

  6          100        .0723    .9560     .5019    .9995     .6065    .9998
             400        .0000    .0000     .0840    .6060     .7133    .9684
            1600        .0000    .0000     .0005    .0112     .2279    .5371

  8          100        .0000    .0000     .0000    .0000     .0000    .0000
             400        .0       .0        .0       .0000     .0000    .0000
            1600        .0       .0        .0       .0        .0       .0

Note: In the above table, the entry .0000 means that the number was a zero when rounded to 4 decimal places. The entry .0 was an exact zero, to the accuracy of the computations (about 7 decimal places).
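As a small worked illustration of the degrees-of-freedom formula $\tfrac{1}{2}[(p - m)^2 - (p + m)]$, the sketch below evaluates it for several numbers of factors. Here $p$ is the number of variables, not a probability level. The function name `lr_test_df` is hypothetical, and the value of 20 variables is an assumption used only for illustration.

```python
def lr_test_df(p: int, m: int) -> int:
    """Degrees of freedom of the large-sample chi-square approximation
    for the hypothesis of m common factors among p variables:
    df = ((p - m)**2 - (p + m)) / 2."""
    twice_df = (p - m) ** 2 - (p + m)
    # (p - m)**2 and (p + m) always have the same parity, so the
    # difference is even and integer division is exact.
    return twice_df // 2


if __name__ == "__main__":
    p = 20  # assumed number of variables, for illustration only
    for m in (2, 3, 4, 6, 7):
        print(f"m = {m}: df = {lr_test_df(p, m)}")
```

Under this assumption the hypothesis of three factors would be tested on 133 degrees of freedom and the hypothesis of seven factors on 71, so each additional factor lowers the degrees of freedom available to the test.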

In Table 8 are presented the ranges of these probability levels, for selected numbers of factors. For the formal model (matrices 1 and 5 for three factors, and matrices 2 and 6 for seven factors), in all cases one would accept (at any reasonable probability level from .001 to .100) the hypothesis of the number of factors in the major domain. The probability levels ranged from a low of .1173 for one high $b^2_{1j}$, three factor matrix to a high of .9998 for a low $b^2_{1j}$, seven factor matrix. However, in one case, with low $b^2_{1j}$, three factors, and a sample size of 100, the hypothesis of two factors was also accepted (p = .6504). With seven factors and low $b^2_{1j}$ (matrix 6), an hypothesis of only four factors was supported with 100 observations and an hypothesis of six factors was supported with 400 observations.
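The acceptance of too few factors described above can be sketched with the sequential rule of accepting the smallest number of factors whose probability level is not significant. The function name `smallest_accepted` and the .05 level are illustrative assumptions. The inputs are the maximum probability levels from Table 8, so the output shows what happened for the most favorable sample in each condition, not for every sample.

```python
from typing import Dict, Optional


def smallest_accepted(prob_levels: Dict[int, float],
                      alpha: float = 0.05) -> Optional[int]:
    """Return the smallest number of factors whose chi-square probability
    level exceeds alpha, or None if every tested hypothesis is rejected."""
    for m in sorted(prob_levels):
        if prob_levels[m] > alpha:
            return m
    return None


# Maximum probability levels from Table 8. Only selected numbers of
# factors were tabulated, so intermediate values of m are not tested.
matrix5_n100 = {2: 0.6504, 3: 0.8027, 7: 0.9960}  # 3 major factors, formal, low b^2
matrix6_n100 = {4: 0.9560, 6: 0.9995, 7: 0.9998}  # 7 major factors, formal, low b^2
matrix6_n400 = {4: 0.0000, 6: 0.6060, 7: 0.9684}
matrix3_n1600 = {2: 0.0, 3: 0.0, 7: 0.0}          # 3 major factors, simulation, high b^2

print(smallest_accepted(matrix5_n100))   # 2
print(smallest_accepted(matrix6_n100))   # 4
print(smallest_accepted(matrix6_n400))   # 6
print(smallest_accepted(matrix3_n1600))  # None
```

In the first three cases the rule stops before the number of factors in the major domain is reached. In the last case no tabulated hypothesis is accepted.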

For all simulation model matrices, however, the hypothesis that the number of factors was equal to the number of factors in the major domain was rejected. Even the hypothesis of seven factors for a sample factor matrix with high $b^2_{1j}$ and only three factors in the major domain was rejected (although one matrix of sample size 100 did have a p = .0194, which would not have been rejected at the .01 level). Thus, this test is appropriate for testing the hypothesis that the factor analytic model holds exactly in the data, but it is of no use as a measure of goodness of fit for data that do not fit the model.

D. Analyses of Variance

Separate analysis of variance summary tables for the six measures $\rho_1$, $\rho_2$, $c_1$, $c_2$, $r_1$, and $r_2$ are presented in Appendix B. These analyses were performed in order to discover the relative sizes of the effects of the four independent variables: level of $b^2_{1j}$, model, number of factors in the major domain, and sample size. Since the assumption of normality of analysis of variance was possibly violated, especially with $\rho_1$, $\rho_2$, $r_1$, and $r_2$, borderline significant F ratios should not be taken too seriously.

The results for $\rho_1$ and $\rho_2$ were again very similar, so results are discussed in terms of $\rho_2$. The main effect of model accounted for 61.19% of the total sum of squares for $\rho_2$. The average value of $\rho_2$ for the formal model was 1.042, while for the simulation model it was only .668. The only other large contributor to the total sum of squares (except for within cell) was the interaction between model and level of $b^2_{1j}$, which accounted for 9.07% of the total sum of squares.


                      formal    simulation

high $b^2_{1j}$        1.003       .773
low $b^2_{1j}$         1.080       .563

Four other small but statistically significant effects were also found. The average value of $\rho_2$ was .888 for sample factor matrices with high $b^2_{1j}$ and .822 for those with low $b^2_{1j}$. There was also a significant trend for $\rho_2$ to decrease with increased sample size. The averages were .905, .834, and .826 for sample sizes 100, 400, and 1600, respectively. The B×F and M×F interactions were also significant at the .01 level.


                     3 Factors   7 Factors

high $b^2_{1j}$         .919        .857
low $b^2_{1j}$          .792        .851

                      formal    simulation

3 Factors              1.002       .710
7 Factors              1.082       .627

Although the results were similar for $c_1$ and $c_2$, only 6.68% of the total sum of squares was attributable to error for $c_1$, whereas 13.02% was error for $c_2$. This supported the earlier decision to discuss results in terms of $c_1$ only, as it was less variable within cells. The main effect of level of $b^2_{1j}$ accounted for 25.17% of the total sum of squares. The mean value of $c_1$ was .827 for high $b^2_{1j}$ and 2.807 for low $b^2_{1j}$. Model accounted for 24.62% of the total sum of squares, and the mean for the formal model was .838, while for the simulation model it was 2.796. Sample size accounted for 17.21% of the total sum of squares, with $c_1$ dropping as sample size increased. The means were 2.964, 1.411, and 1.077 for the sample sizes 100, 400, and 1600, respectively. The level of $b^2_{1j}$ by model interaction accounted for 13.58% of the total sum of squares.


                      formal    simulation

high $b^2_{1j}$         .576       1.079
low $b^2_{1j}$         1.101       4.513
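The cell and marginal means quoted for $c_1$ can be recovered directly from the means in Table 6. The following sketch does so. The assignment of matrices to design cells (matrices 1 and 2 as high $b^2_{1j}$ formal, 3 and 4 as high $b^2_{1j}$ simulation, 5 and 6 as low $b^2_{1j}$ formal, 7 and 8 as low $b^2_{1j}$ simulation) is an assumption inferred from the text and Table 8, and the variable names are illustrative.

```python
from statistics import mean

# Means of c_1 from Table 6, keyed by matrix number. Each tuple holds the
# values for sample sizes 100, 400, and 1600, in that order.
C1 = {
    1: (0.9587, 0.2472, 0.0464),
    2: (1.6355, 0.4625, 0.1035),
    3: (1.4689, 0.5290, 0.3523),
    4: (2.5336, 0.9252, 0.6677),
    5: (1.5298, 0.3131, 0.0749),
    6: (3.7076, 0.7221, 0.2576),
    7: (5.0938, 2.9050, 2.3474),
    8: (6.7834, 5.1870, 4.7623),
}

# Assumed design layout: (level of b^2, model) -> matrix numbers.
CELLS = {
    ("high", "formal"): (1, 2),
    ("high", "simulation"): (3, 4),
    ("low", "formal"): (5, 6),
    ("low", "simulation"): (7, 8),
}


def cell_mean(matrices):
    """Average c_1 over every sample size of the given matrices."""
    return mean(value for m in matrices for value in C1[m])


cell_means = {cell: round(cell_mean(ms), 3) for cell, ms in CELLS.items()}
# Marginal means for sample sizes 100, 400, and 1600 across all matrices.
size_means = [round(mean(C1[m][i] for m in C1), 3) for i in range(3)]

print(cell_means)
print(size_means)
```

With this layout the four cell means come out as .576, 1.079, 1.101, and 4.513, and the sample size means as 2.964, 1.411, and 1.077, which agree with the values reported in the text.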

The main effect of number of factors in the major domain ($c_1$ = 1.322 for three factors, $c_1$ = 2.312 for seven factors) and the three interactions shown below were also significant at the .01 level.


                     3 Factors   7 Factors

high $b^2_{1j}$         .600       1.055
low $b^2_{1j}$         2.044       3.570

                        100        400        1600

high $b^2_{1j}$        1.649       .541       .292
low $b^2_{1j}$         4.279      2.282      1.861

                      formal    simulation

3 Factors               .528       2.116
7 Factors              1.148       3.477

Unfortunately, due to negative values for low $b^2_{1j}$, simulation model, and 7 factors (summed across sample size), every main effect and interaction which did not involve sample size was highly significant for $r_1$ and $r_2$. While the negative values (mean $r$ values of -.338 and -.315) indicated how poor results were for that combination, the effect was apparently strong enough to influence most other effects. The main effect of sample size did account for 4.14% of the total sum of squares, with $r_1$ increasing as sample size increased. The means were .587, .780, and .819 for 100, 400, and 1600 observations, respectively.

IV. DISCUSSION

A major goal of the present study was to find or develop a measure of goodness of fit for the factor analytic model. One such measure studied was $\rho_2$. The main effect of model accounted for 61.19% of the sum of squares of $\rho_2$. For samples from population-correlation matrices constructed to exactly fit the factor analytic model, $\rho_2$ worked exceedingly well. Only in the case of seven factors and low $b^2_{1j}$ was it necessary to have a sample size greater than 100. $\rho_2$ also had the desirable property of approaching unity (or nearly so) as the sample size increased, for the matrices developed from the formal model. The samples from the simulation model behaved quite differently. Even in the best case (three factors, high $b^2_{1j}$), the average value of $\rho_2$ was .8535. Thus, in all cases, $\rho_2$ reflected the presence of the minor factors.

There was also a significant decrease in $\rho_2$ for sample sizes 400 and 1600, when compared with 100. This is not a good property for a proposed measure of goodness of fit, as intuitively one would expect the fit to a good model to improve with more observations. However, this decrease is due in part to $\rho_2$ coming down to 1.000, after going over that value for samples of 100. There was a significant decrease in $c_1$ with increased sample size; that is, the solutions reproduced the correlations implied by the model better with more observations. This was further illustrated by the fact that the $\rho$ values were approaching the population values obtained by Tucker and Lewis. This can be seen by subtracting the population values (Table 1) from the sample values (Tables 3 and 4).

Table 9

Differences Between Sample Values and Population Values of $\rho_2$

                                                      Sample Size
                                                 100       400      1600

High $b^2_{1j}$, Formal Model,      3 Factors    .0011     .0018     .0004
                                    7 Factors    .0139     .0014    -.0003

High $b^2_{1j}$, Simulation Model,  3 Factors    .0235    -.0047     .0019
                                    7 Factors    .0248    -.0271     .0018

Low $b^2_{1j}$, Formal Model,       3 Factors   -.0071     .0127     .0023
                                    7 Factors    .4285     .0482    -.0016

Low $b^2_{1j}$, Simulation Model,   3 Factors    .0657     .0190     .0123
                                    7 Factors    .1222     .0512     .0182

Since these values (Table 9) are only accurate to two decimal places (as the Tucker and Lewis figures are to two places), all except the low $b^2_{1j}$, simulation model matrices were within rounding error of the population values for samples of size 1600. Thus, the decreases in $\rho_2$ with increasing sample sizes were toward the population values.
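The rounding argument can be checked mechanically. The sketch below flags the Table 9 differences at a sample size of 1600 that exceed half a unit in the second decimal place. Treating "within rounding error" as an absolute difference of at most .005 is an assumption about what is meant, and the names used are illustrative.

```python
# Differences (sample mean minus population value) at N = 1600, from Table 9.
# Keys are (level of b^2, model, number of factors in the major domain).
DIFF_1600 = {
    ("high", "formal", 3): 0.0004,
    ("high", "formal", 7): -0.0003,
    ("high", "simulation", 3): 0.0019,
    ("high", "simulation", 7): 0.0018,
    ("low", "formal", 3): 0.0023,
    ("low", "formal", 7): -0.0016,
    ("low", "simulation", 3): 0.0123,
    ("low", "simulation", 7): 0.0182,
}

# Largest error introduced by rounding a population value to two decimals.
HALF_UNIT = 0.005

# Conditions whose difference cannot be explained by rounding alone.
outside = sorted(key for key, diff in DIFF_1600.items() if abs(diff) > HALF_UNIT)
print(outside)
```

Only the two low $b^2_{1j}$, simulation model conditions are flagged, in line with the statement above.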

An important result was pointed out by the significant interactions between level of $b^2_{1j}$ and the model for $\rho_2$ and $c_1$. In both cases, the results in the simulation model, low $b^2_{1j}$ cell were much poorer than would have been predicted from the main effects alone. These results, an average $\rho_2$ of .563 and an average $c_1$ of 4.513 (over four times greater than the next largest cell), showed that one cannot expect to support one's hypothesis with variables that have a low percentage of variance accounted for in the major factors. It was interesting to note that while the values of $c_1$ were about the same in the two cells high $b^2_{1j}$, simulation model and low $b^2_{1j}$, formal model, the former had $\rho_2$ = .773 and the latter had $\rho_2$ = 1.080. Thus the model correlations were reproduced as well for simulation model, high $b^2_{1j}$ as for formal model, low $b^2_{1j}$.

The fact that level of $b^2_{1j}$ (25.17%) accounted for as large a percentage of the total sum of squares of $c_1$ as did model (24.62%) was encouraging. Thus, if the simulation model is a better model of the world, it is still possible for an experimenter to improve his results by constructing measures with high proportions of variance accounted for by the major factors.

The $\chi^2$ statistic was useful for sample factor matrices for the formal model only. Even then, it led to the acceptance of too few factors in some cases, with low $b^2_{1j}$ and/or too few observations.

The measure $r_1$ did well for the formal model matrices, although more observations were necessary before it neared its maximum of 1.00. Also, with seven factors, low $b^2_{1j}$, and 1600 observations it only attained .9392. However, the simulation model matrices with high $b^2_{1j}$ also gave high values of $r_1$. Thus the maximum-likelihood estimation procedure was doing a good job of reproducing the model correlations, but this was not reflected in $\rho_2$. The results on matrices 7 and 8 (simulation model, low $b^2_{1j}$) confirmed the importance of controlling the relationship between the major and minor influences on one's results. The major factors should predominate over minor factors in any study. The results did indicate that it is easier to reproduce a small number of factors in a poorly designed study.

Thus $r_1$ was shown to be useful as a measure of goodness of fit. It does require the writing down of an hypothesized factor matrix, so it cannot normally be used in exploratory studies. It has the advantage that it could be used with any factor analytic procedure. On the other hand, $\rho_2$ could be used with any study, as long as the maximum-likelihood estimation procedure is used. However, it seems somewhat less useful than $r_1$ when the factor analytic model does not hold in the population.

An investigator should strive to develop variables which strongly represent his major factors. He should have a large sample size (a ratio of five observations per variable was not always sufficient to ensure good results, even with high $b^2_{1j}$ and the formal model). If he uses maximum-likelihood factor analysis, he should usually stop factoring when the $\chi^2$ becomes non-significant, if his sample size is large enough. However, with real data, there may be statistically significant minor factors which are not of interest. In this case the $\chi^2$ cannot give an indication of when to stop factoring. However, the investigator can use $\rho_2$ as an estimate of how well the formal factor analytic model holds in the population from which his data were taken. Although the statistical properties of this estimate are not known, and it may be high for small samples, it does have the desirable property of having a value of 1.00 in populations which exactly fit the formal factor analytic model.

Small values of $\rho_2$ probably indicate a poorly controlled study, and the investigator may be able to improve his results by using better controls over minor factors, by having variables with high percentages of variance in the major domain, and by having a higher ratio of variables to major factors. Finally, in a confirmatory study he should write down an hypothesized factor matrix and use $c_1$ and $r_1$ to determine its goodness of fit.

APPENDIX A

MEANS OF $\rho_1$ AND $\rho_2$


Table 10

Means of $\rho_1$ for 3 Factor Matrices

Matrix   Sample size    1 Factor   2 Factors   3 Factors   4 Factors

  1          100         .4321       .7741      1.0011      1.0135
             400         .4385       .7471      1.0017      1.0054
            1600         .4323       .7575      1.0004      1.0011

  3          100         .4776       .6626       .8570       .8818
             400         .4648       .6458       .8262       .8432
            1600         .4679       .6522       .8321       .8444

  5          100         .6739       .8879       .9932      1.0397
             400         .6584       .8639      1.0126      1.0255
            1600         .6581       .8641      1.0023      1.0050

  7          100         .4719       .5591       .6264       .6863
             400         .4413       .5205       .5714       .6058
            1600         .4577       .5177       .5629       .6112

Table 11

Means of $\rho_2$ for 3 Factor Matrices

Matrix   Sample size    1 Factor   2 Factors   3 Factors   4 Factors

  1          100         .4273       .7703      1.0011      1.0116
             400         .4375       .7462      1.0018      1.0054
            1600         .4320       .7573      1.0004      1.0011

  3          100         .4734       .6571       .8535       .8778
             400         .4639       .6446       .8253       .8422
            1600         .4676       .6519       .8319       .8442

  5          100         .6702       .8852       .9929      1.0416
             400         .6578       .8634      1.0127      1.0257
            1600         .6579       .8655      1.0023      1.0050

  7          100         .4670       .5509       .6157       .6743
             400         .4403       .5187       .5690       .6029
            1600         .4574       .5173       .5623       .6108